Abstract

We describe a new approach for dealing with the following central problem in the self-organization of a geometric sensor network: We are given a polygonal region R and a large, dense set of sensor nodes that are scattered uniformly at random in R. There is no central control unit, and nodes can only communicate locally by wireless radio with all other nodes that are within communication radius r, without knowing their coordinates or distances to other nodes. The objective is to develop a simple distributed protocol that allows nodes to identify themselves as being located near the boundary of R and to form connected pieces of the boundary. We compare several centrality measures commonly used in the analysis of social networks and show that restricted stress centrality is particularly suited for geometric networks; we provide mathematical as well as experimental evidence for the quality of this measure.

1 Introduction 

In recent years, the study of wireless sensor networks (WSN) has become a rapidly developing research area that offers fascinating perspectives for combining technical progress with new applications of distributed computing. Typical scenarios involve a large swarm of small and inexpensive processor nodes, each with limited computing and communication resources, that are distributed in some geometric region; communication is performed by wireless radio with limited range. As energy consumption is a limiting factor for the lifetime of a node, communication has to be minimized. Upon startup, the swarm forms a decentralized and self-organizing network that surveys the region.

From an algorithmic point of view, the characteristics of a sensor network require working under a paradigm that is different from classical models of computation: The absence of a central control unit, the limited capabilities of nodes, and the limited communication between nodes require new algorithmic ideas that combine methods of distributed computing and network protocols with traditional centralized network algorithms. In other words: How can we use a limited amount of strictly local information in order to achieve distributed knowledge of global network properties?

Figure 1: Scenario of a geometric sensor network, obtained by scattering sensor nodes in the street network surrounding Braunschweig University of Technology. (a) 60,000 sensor nodes, distributed uniformly at random in a polygonal region. (b) A zoom into (a) shows the communication graph. (c) A further zoom into (b) shows the communication ranges.

This task is much simpler if the exact location of each 
node is known. Computing node coordinates has re- 
ceived a considerable amount of attention. Unfortu- 
nately, computing exact coordinates requires the use 
of special location hardware like GPS, or alternatively, 
scanning devices, imposing physical demands on size 
and structure of sensor nodes. As we demonstrated in our paper [?], current methods for computing coordinates based on anchor points and distance estimates encounter serious difficulties in the presence of even small inaccuracies, which are unavoidable in practice.

As shown in [3], there is a way to sidestep many of the above difficulties, as some structural location aspects do not depend on coordinates. This is particularly relevant for sensor networks that are deployed in an environment with interesting geometric features. (See [?] for a more detailed discussion.) Obviously, scenarios such as the one shown in Figure 1 pose a number of interesting geometric questions. Conversely, exploiting the basic fact that the communication graph of a sensor network has a number of geometric properties provides an elegant way to extract structural information.

One key aspect of location awareness is boundary 
recognition, making sensors close to the boundary of the 
surveyed region aware of their position and letting them 
form connected boundary strips along each verge. This 
is of major importance for keeping track of events enter- 
ing or leaving the region, as well as for communication 
with the outside. Neglecting the existence of holes in the 
region may also cause problems in communication, as 
routing along shortest paths tends to put an increased 
load on nodes along boundaries, exhausting their energy supply prematurely; thus, a moderately sized hole (caused by obstacles, by an event, or by a cluster of failed nodes) may tend to grow larger and larger.

We show that using a combination of geometry, 
stochastics, and tools from social networks, a consid- 
erable amount of location awareness can indeed be 
achieved in a large swarm of sensor nodes without any 
use of location hardware. The result is a relatively sim- 
ple distributed algorithm for boundary recognition in 
large geometric sensor networks that shows excellent 
performance for test networks with 80,000 nodes. 

2 Centrality Measures for Social Networks 

A different area studying large and complex graphs is the field of social networks, where nodes represent individuals in a large collective, and edges indicate some interaction between them. (See the recent book [?] for an overview and an extensive list of references.) Identifying asymmetries within a network is a natural approach; one particular way of doing this is based on so-called centrality indices, i.e., real-valued functions that assign high values to more "central" nodes, while "boundary" nodes get low values.

In the last five decades, many different centrality in- 
dices have been proposed. There are two major classes: 
One is based on local properties of the graph, so it is 
particularly suited for typical scenarios of sensor net- 
works and will be discussed in some detail. The other 
class is based on more global properties, e.g., the com- 
putation of eigenvalues of the adjacency matrix, so it is 
less useful for our purposes. 

Centrality indices of the first class can be subdivided into three subclasses: The first considers the distances to other vertices, the second determines the number of vertices at a given distance, while the third makes use of shortest paths containing a given vertex.

Figure 2: $k$-hop neighborhood for $k = 4$.

Considering the maximum distance to another vertex in the graph (based on hop count) does not reflect local topological structures in a sensor network; in particular, it fails to indicate closeness to interior boundaries. The size of the $k$-hop neighborhood is better suited, and (for the simple choice $k = 1$) was indeed the basis for our approach described in [?], as it is an indicator for the size of the intersection of the communication range of a node with R. It is tempting to try to improve the results by increasing $k$, but this is not without drawbacks with respect to topological properties, as a boundary node close to a "thick" part of R may get a better value than an interior node that is located in a "thin" part of the region. See Figure 2 for a scenario with 80,000 nodes; index values are represented on a color scale from dark (low) to light (high).
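The $k$-hop index can be sketched as a depth-limited breadth-first search. This is a centralized simulation, assuming the communication graph is given as an adjacency dictionary; in the network itself, each node would obtain the same count after $k$ rounds of local flooding. The function name `k_hop_neighborhood_size` is hypothetical.

```python
from collections import deque


def k_hop_neighborhood_size(adj, v, k):
    """Count the nodes other than v that lie within k hops of v (depth-limited BFS)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # nodes at distance k are counted but not expanded further
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return len(dist) - 1


# A path 0-1-2-3-4: the endpoint 0 sees fewer nodes than the middle node 2.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(k_hop_neighborhood_size(path, 0, 2))  # 2
print(k_hop_neighborhood_size(path, 2, 2))  # 4
```

On this toy graph the endpoint gets the lower value, which is the effect the index relies on; the "thick versus thin" drawback only shows up for larger $k$ in genuinely two-dimensional regions.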

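This leaves the structure of shortest paths. In particular, the stress centrality $\mathrm{stress}(v)$ is defined as the number of shortest paths containing $v$:

$$\mathrm{stress}(v) := \sum_{s \neq v} \sum_{t \neq v} \sigma_{st}(v), \tag{1}$$

where $\sigma_{st}(v)$ denotes the number of shortest paths between $s$ and $t$ that contain $v$. Only considering vertices within a given distance $\delta$ of $v$ yields the restricted stress centrality:

$$\mathrm{stress}(v,\delta) := \sum_{\substack{s \neq v \\ d(s,v) \le \delta}} \sum_{\substack{t \neq v \\ d(t,v) \le \delta}} \sigma_{st}(v). \tag{2}$$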
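A minimal sketch of the stress centrality and its restricted variant, assuming an unweighted, undirected graph given as an adjacency dictionary. It uses the standard fact that $v$ lies on a shortest $s$-$t$ path iff $d(s,v) + d(v,t) = d(s,t)$, in which case $\sigma_{st}(v) = \sigma_{sv}\,\sigma_{vt}$; the path counts come from breadth-first search. Function names are hypothetical, and the all-pairs BFS is only meant for small illustrative graphs.

```python
from collections import deque


def bfs_counts(adj, s):
    """Hop distances from s and the number of shortest paths from s to each reachable node."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:  # first visit: w is one hop farther than u
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:  # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma


def restricted_stress(adj, v, delta=None):
    """Sum of sigma_st(v) over ordered pairs s != t (both != v) with
    d(s, v) <= delta and d(t, v) <= delta; delta=None gives stress(v)."""
    info = {u: bfs_counts(adj, u) for u in adj}  # all-pairs BFS: small graphs only
    dist_v, sigma_v = info[v]
    total = 0
    for s in dist_v:
        if s == v or (delta is not None and dist_v[s] > delta):
            continue
        dist_s, sigma_s = info[s]
        for t in dist_v:
            if t in (s, v) or (delta is not None and dist_v[t] > delta):
                continue
            # v is on a shortest s-t path iff d(s,v) + d(v,t) == d(s,t)
            if dist_s[v] + dist_v[t] == dist_s[t]:
                total += sigma_s[v] * sigma_v[t]
    return total


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(restricted_stress(path, 2))     # 8: pairs (0|1, 3|4) in both orders
print(restricted_stress(path, 2, 1))  # 2: only the pairs (1, 3) and (3, 1)
```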

In the context of a communication network, this measure can be motivated as follows: If each vertex sends a message to every other vertex along all shortest paths, the stress centrality counts how many times vertex $v$ is busy passing on a message. As there may be many shortest paths, it is reasonable to assume that a vertex sends a message to some other vertex and uses any of their shortest paths with the same probability, i.e., $1/\sigma_{st}$, where $\sigma_{st}$ denotes the number of shortest paths between $s$ and $t$. The probability that a vertex $v$ has to transport the message is thus given by $p_{st}(v) := \sigma_{st}(v)/\sigma_{st}$. The betweenness centrality $\mathrm{betw}(v)$ is defined as the sum over all $p_{st}(v)$:

$$\mathrm{betw}(v) := \sum_{s \neq v} \sum_{t \neq v} p_{st}(v) = \sum_{s \neq v} \sum_{t \neq v} \frac{\sigma_{st}(v)}{\sigma_{st}}. \tag{3}$$

Figure 3: Performance of different centrality measures, shown for a scenario of 80,000 nodes distributed uniformly at random. (a) Betweenness centrality. (b) Stress centrality. (c) Restricted stress centrality with threshold filter.
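The difference between stress and betweenness is easiest to see on a graph with several shortest paths. The following sketch, assuming an adjacency dictionary and using the hypothetical function name `betweenness`, evaluates the double sum in Eq. (3) via BFS path counts, with $\sigma_{st}(v) = \sigma_{sv}\,\sigma_{vt}$ whenever $v$ lies on a shortest $s$-$t$ path.

```python
from collections import deque


def bfs_counts(adj, s):
    """Hop distances from s and the number of shortest paths from s to each reachable node."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma


def betweenness(adj, v):
    """betw(v): sum over ordered pairs s != t (both != v) of sigma_st(v) / sigma_st."""
    info = {u: bfs_counts(adj, u) for u in adj}  # all-pairs BFS: small graphs only
    dist_v, sigma_v = info[v]
    total = 0.0
    for s in dist_v:
        if s == v:
            continue
        dist_s, sigma_s = info[s]
        for t in dist_v:
            if t in (s, v):
                continue
            if dist_s[v] + dist_v[t] == dist_s[t]:
                # p_st(v) = sigma_st(v) / sigma_st, with sigma_st(v) = sigma_sv * sigma_vt
                total += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return total


# On a 4-cycle, 1 and 3 are joined by two shortest paths (via 0 and via 2),
# so node 0 carries each of the two ordered messages with probability 1/2.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(betweenness(cycle, 0))  # 1.0
```

On the same cycle, the stress centrality of node 0 is 2, because every shortest path through 0 is counted fully rather than weighted by $1/\sigma_{st}$.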



See Figure 3(a) for the evaluation of betweenness cen- 
trality for our example, while Figure 3(b) shows the 
stress centrality. (Again, low values are indicated by 
dark dots, while high values are represented by light 
color.) A detailed analysis for restricted stress central- 
ity is given in the following section. 

3 Using Restricted Stress Centrality 

In the context of a sensor network, it takes a number of algorithmic steps to evaluate a measure and use the results for extracting global features like boundaries. Some of those details are described in our paper [?] and can be used analogously for other measures: Using an auxiliary tree structure (which is easy to obtain), we can aggregate local results globally in order to determine appropriate threshold values. Once a threshold has been set, it can be distributed to all nodes in the network; after that, each node simply checks whether its centrality index is above or below the threshold, resulting in a classification as "interior" or "boundary". A good index must have the following properties:

• It should require only simple local computations for 
each node. 

• Setting a good threshold value should be relatively 
easy. In other words: The distributions for inte- 
rior nodes and for boundary nodes should be well- 
separated. 
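The aggregate-then-threshold pipeline can be sketched as a centralized simulation: build a BFS spanning tree as the auxiliary tree, let every node pass the histogram of index values in its subtree to its parent (a convergecast), pick a threshold at the root, and let each node classify itself. The rule in `choose_threshold` (a fixed quantile of the aggregated histogram) is a hypothetical stand-in, not the rule used in the cited work; all function names are likewise hypothetical.

```python
from collections import Counter, deque


def bfs_tree(adj, root):
    """Parent pointers of a BFS spanning tree, plus the BFS visiting order."""
    parent, order = {root: None}, [root]
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                order.append(w)
                queue.append(w)
    return parent, order


def convergecast_histogram(adj, root, value):
    """Aggregate a histogram of node values towards the root along the tree."""
    parent, order = bfs_tree(adj, root)
    hist = {u: Counter([value[u]]) for u in order}
    for u in reversed(order):  # children come after parents in BFS order
        if parent[u] is not None:
            hist[parent[u]] += hist[u]  # "send" the subtree histogram to the parent
    return hist[root]


def choose_threshold(hist, fraction):
    """HYPOTHETICAL rule: smallest value x such that at least `fraction` of nodes have values <= x."""
    total, running = sum(hist.values()), 0
    for val in sorted(hist):
        running += hist[val]
        if running >= fraction * total:
            return val
    return max(hist)


def classify(value, threshold):
    """After the threshold is flooded, each node compares its own index locally."""
    return {u: "boundary" if x < threshold else "interior" for u, x in value.items()}


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
stress1 = {0: 0, 1: 2, 2: 2, 3: 2, 4: 0}  # stress(v, 1) on this path
hist = convergecast_histogram(path, root=2, value=stress1)
threshold = choose_threshold(hist, fraction=0.5)
print(threshold)  # 2
print(classify(stress1, threshold))
```

Only histograms travel up the tree and a single number travels down, which matches the requirement that the per-node work stays local.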



Theorem 1 Using the restricted stress centrality $\mathrm{stress}(v, 1)$, nodes are classified correctly with high probability for sufficiently large node density.



See Figure 3(c) for the result for restricted stress cen- 
trality for relatively moderate density: It can be seen 
that all boundary nodes are correctly classified. The 
interior contains a number of false positives, which can 
be eliminated by additional filters. 

Discussion of Theorem 1. Let $v$ be a node in the network, and let $\delta(v)$ be the number of neighbors of $v$. Furthermore, $\mathrm{stress}(v, 1)$ is the number of ordered pairs of nonadjacent neighbors of $v$. Then the normalized coefficient

$$st(v) := \frac{\mathrm{stress}(v,1)}{\delta(v)\,(\delta(v)-1)}$$

describes the fraction of pairs of neighbors that are nonadjacent, i.e., that have a shortest-path connection via $v$, so for a dense network

$$\mathbb{E}[\mathrm{stress}(v,1)] \approx \mathbb{E}[st(v)] \cdot \mathbb{E}[\delta(v)]^2.$$
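Computing $st(v)$ needs only the neighbor lists of $v$ and of its neighbors, i.e., information available after one round of local communication. A minimal sketch, assuming an adjacency dictionary of sets and the hypothetical function name `st`:

```python
from itertools import combinations


def st(adj, v):
    """Fraction of pairs of neighbors of v that are nonadjacent,
    i.e. stress(v, 1) / (delta(v) * (delta(v) - 1))."""
    nbrs = list(adj[v])
    d = len(nbrs)  # delta(v)
    if d < 2:
        return None  # undefined with fewer than two neighbors
    unordered = sum(1 for a, b in combinations(nbrs, 2) if b not in adj[a])
    stress_v1 = 2 * unordered  # ordered pairs, matching the double sum for stress(v, 1)
    return stress_v1 / (d * (d - 1))


# Node 0 has neighbors 1, 2, 3; only 1 and 2 are adjacent, so 2 of 3 pairs are nonadjacent.
g = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(st(g, 0))  # 0.6666666666666666
```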



Now consider any neighbor $w$ of $v$. Let $C(v) := \{p \in R \mid d(p, v) \le r\}$ be the portion of R that is within communication range of $v$. See Figure 4; let $A_{vw} := C(v) \cap C(w)$ and $M_{vw} := C(v) \setminus C(w)$. For a uniform random distribution, the expected fraction of neighbors of $v$ that are not adjacent to $w$ corresponds to the ratio of areas $\mathrm{Ar}(M_{vw})/\mathrm{Ar}(C(v))$, where $\mathrm{Ar}(\cdot)$ denotes area. Integrating over all possible positions of $w$, we get an overall expected value

$$\sigma := \frac{1}{\mathrm{Ar}(C(v))} \int_{w \in C(v)} \frac{\mathrm{Ar}(M_{vw})}{\mathrm{Ar}(C(v))}\, dw.$$

As the size of the areas also depends on the distance $s$ of $v$ from the boundary, solving this integral in closed form for all $s$ would require finding a primitive that contains $s$ as an explicit parameter; this appears to be hopeless, even using ideas as described in [2]. However, for specific values of $s$, an explicit numerical calculation is possible: For $s \ge r = 1$ and $d(w, v) = x$, the area of $M_{vw}$ turns out to be

$$\mathrm{Ar}(M_{vw}) = \pi - 2\left(\arccos\frac{x}{2} - \frac{1}{2}\sin\left(2\arccos\frac{x}{2}\right)\right).$$

Passing to polar coordinates around $v$ (so that $dw$ becomes $2\pi x\,dx$ after integrating over the angle, and $\mathrm{Ar}(C(v)) = \pi$), the resulting integral

$$\sigma = \frac{2}{\pi} \int_0^1 x \left(\pi - 2\left(\arccos\frac{x}{2} - \frac{1}{2}\sin\left(2\arccos\frac{x}{2}\right)\right)\right) dx$$

can be solved numerically, resulting in a value of $\sigma = 0.4134966716$.

Figure 4: For any given neighbor $w$ of $v$, the expected fraction of neighbors of $v$ that are not neighbors of $w$ is given by $\mathrm{Ar}(M_{vw})/\mathrm{Ar}(C(v))$.
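The value of $\sigma$ can be checked with a few lines of numerical integration. The sketch below applies the composite Simpson rule to the integral above (with $r = 1$) and compares the result against $3\sqrt{3}/(4\pi)$, the known probability that two points drawn uniformly from a unit disk are more than distance 1 apart, which is exactly what $\sigma$ measures for an interior node. The function names are hypothetical.

```python
import math


def area_M(x):
    """Area of C(v) minus C(w) for unit disks whose centers are x apart (0 <= x <= 1)."""
    theta = math.acos(x / 2)
    return math.pi - 2 * (theta - 0.5 * math.sin(2 * theta))


def sigma_numeric(n=10_000):
    """sigma = (2/pi) * integral_0^1 x * Ar(M_vw)(x) dx, via composite Simpson's rule (n even)."""
    h = 1.0 / n
    f = lambda x: x * area_M(x)
    total = f(0.0) + f(1.0)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * h)
    return (2 / math.pi) * total * h / 3


print(sigma_numeric())                   # approximately 0.4134966716
print(3 * math.sqrt(3) / (4 * math.pi))  # closed form, approximately 0.4134966716
```

The integrand is smooth on $[0, 1]$, so Simpson's rule converges quickly; no special quadrature library is needed.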

For determining threshold values that separate interior and boundary values of $st$, we also need the random distribution of $st$ for different values of $s$. These distributions can be determined with additional numerical computations; using a Monte Carlo simulation, we obtained distributions like the ones in Figure 5. Shown are the distributions for 20 expected neighbors (Figure 5(a)) and for 200 expected neighbors (Figure 5(b)); the left (red) curve shows the distribution of $st$ for a node $v$ on the boundary, while the right (green/blue) curve shows the distribution for a node completely in the interior of R. The probability of error for a specific threshold is given by the normalized area to the right of the threshold below the left curve (false negatives) or by the normalized area to the left of the threshold below the right curve (false positives). Clearly, the error becomes arbitrarily small for large neighborhood size. □
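A Monte Carlo experiment of this kind can be sketched as follows, for $r = 1$ and a node $v$ at the origin, once with R locally covering the whole communication disk (interior node) and once with R being the half-plane $y \ge 0$ (node exactly on a straight boundary). The straight boundary and the sampling scheme (a fixed number of nodes dropped uniformly into the square $[-1,1]^2$, chosen so that an interior node expects the given number of neighbors) are simplifying assumptions; the function names are hypothetical.

```python
import math
import random


def sample_st(expected_neighbors, boundary, rng):
    """One sample of st(v) for v at the origin with communication radius r = 1.

    ASSUMPTION: a fixed number of nodes is dropped uniformly into [-1, 1]^2 so that an
    interior node expects `expected_neighbors` neighbors; if `boundary` is True,
    R is the half-plane y >= 0, so v lies on a straight boundary.
    """
    count = round(expected_neighbors * 4 / math.pi)
    nbrs = []
    for _ in range(count):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if boundary and y < 0:
            continue  # this point lies outside R, so no node is placed there
        if x * x + y * y <= 1:
            nbrs.append((x, y))
    d = len(nbrs)
    if d < 2:
        return None
    nonadjacent = sum(
        1
        for i in range(d)
        for j in range(i + 1, d)
        if math.dist(nbrs[i], nbrs[j]) > 1
    )
    return nonadjacent / (d * (d - 1) / 2)


def sample_distribution(expected_neighbors, boundary, trials=2000, seed=0):
    rng = random.Random(seed)
    samples = (sample_st(expected_neighbors, boundary, rng) for _ in range(trials))
    return [s for s in samples if s is not None]


interior_st = sample_distribution(20, boundary=False)
boundary_st = sample_distribution(20, boundary=True)
print(sum(interior_st) / len(interior_st))  # close to sigma
print(sum(boundary_st) / len(boundary_st))  # clearly smaller
```

Histogramming `interior_st` and `boundary_st` gives the two curves; increasing `expected_neighbors` narrows both and reduces their overlap.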

For intermediate sizes such as the one in our example, choosing a relatively large threshold value avoids too many false negatives, at the expense of a limited ratio of false positives.



4 Algorithm 

In [?], we showed how to estimate $\mathbb{E}[\delta(v)]$ for a node $v$ of boundary distance $s > r$, i.e., a node in the interior of the network. The algorithm constructs a tree, collects a node degree histogram, and floods the result to all nodes. Both the total runtime of the algorithm and the total size of messages are $O(|V| \log^2 |V|)$. Each node stores a constant threshold value $\varepsilon$ with $0 < \varepsilon < \sigma$ that has been chosen in advance. If

$$\frac{\mathrm{stress}(v,1)}{\mathbb{E}[\delta(v)]^2} < \varepsilon,$$

the node declares itself to be a boundary node. In experiments, we found $\varepsilon = 1/3$ to be a particularly good choice.
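A minimal end-to-end simulation of the local test, assuming it reads $\mathrm{stress}(v,1)/\mathbb{E}[\delta(v)]^2 < \varepsilon$ with $\varepsilon = 1/3$, and assuming the interior estimate $\mathbb{E}[\delta(v)]$ is simply known (here computed from the node density) instead of being estimated by the tree-based protocol. The square region of side 5, the density of about 60 expected neighbors, and all function names are hypothetical choices for illustration.

```python
import math
import random


def random_geometric_graph(n, side, r=1.0, seed=0):
    """n nodes uniform in the square [0, side]^2; edges between nodes within distance r."""
    rng = random.Random(seed)
    pts = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(pts[i], pts[j]) <= r:
                adj[i].add(j)
                adj[j].add(i)
    return pts, adj


def stress1(adj, v):
    """stress(v, 1): ordered pairs of distinct neighbors of v that are not adjacent."""
    nbrs = list(adj[v])
    return sum(1 for a in nbrs for b in nbrs if a != b and b not in adj[a])


def declares_boundary(adj, v, expected_degree, eps=1 / 3):
    """ASSUMED local test: stress(v, 1) / E[delta]^2 < eps."""
    return stress1(adj, v) < eps * expected_degree ** 2


side, density = 5.0, 60 / math.pi  # an interior node expects about 60 neighbors
pts, adj = random_geometric_graph(round(density * side * side), side)
expected_degree = density * math.pi  # HYPOTHETICAL: known here, estimated in the protocol


def wall_distance(p):
    return min(p[0], p[1], side - p[0], side - p[1])


flags = {v: declares_boundary(adj, v, expected_degree) for v in adj}
near = [v for v in adj if wall_distance(pts[v]) < 0.05]  # nodes essentially on the boundary
deep = [v for v in adj if wall_distance(pts[v]) > 1.0]   # full communication disk inside R
print(sum(flags[v] for v in near) / len(near))  # fraction of boundary nodes flagged
print(sum(flags[v] for v in deep) / len(deep))  # fraction of false positives
```

Near-boundary nodes lose both neighbors and nonadjacent pairs, so they fall far below the threshold; the remaining interior false positives come mostly from the random fluctuation of $\delta(v)$, which is consistent with the remark that additional filters are useful at moderate densities.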




Figure 5: Random distribution of restricted stress centrality for a node on the boundary and in the interior, for different neighborhood sizes. (a) Distributions for neighborhood size 20. (b) Distributions for neighborhood size 200.

5 Conclusion 

We showed that restricted stress centrality is a useful 
index for extracting topological boundary information 
from a geometric sensor network, provided that the dis- 
tribution of nodes follows a suitable random distribu- 
tion. As this is a rather strong assumption, it appears 
desirable to come up with more general methods. More- 
over, an approach based on random distributions may 
still fail in some rare cases (even though the probability 
of failure is extremely low), so it is particularly inter- 
esting to develop deterministic methods for boundary 
recognition. Such an approach is described in our forthcoming paper [?].